We study the phase diagram and the algorithmic hardness of the random 'locked' constraint satisfaction problems, and compare them to the commonly studied 'non-locked' problems like satisfiability of boolean formulas or graph coloring. The special property of the locked problems is that clusters of solutions are isolated points. This significantly simplifies the determination of the phase diagram, which makes the locked problems particularly appealing from the mathematical point of view. On the other hand, we show empirically that the clustered phase of these problems is extremely hard from the algorithmic point of view: the best known algorithms all fail to find solutions. Our results suggest that the easy/hard transition (for currently known algorithms) in the locked problems coincides with the clustering transition. These problems should thus be regarded as new benchmarks of really hard constraint satisfaction problems.
I. INTRODUCTION



Constraint satisfaction problems (CSPs) play a crucial role in theoretical and applied computer science. Their wide range of applicability arises from their very general nature: given a set of N discrete variables subject to M constraints, a CSP consists in deciding whether there exists an assignment of variables which simultaneously satisfies all the constraints. When such an assignment exists we call it a solution and aim at finding it. One of the most important questions about a CSP is how hard it is to find a solution or prove that there is none. Many CSPs belong to the class of NP-complete problems [1, 2]. This basically means that, if P ≠ NP, there is no algorithm able to solve the worst-case instances of the problem in polynomial time. Next to the question of worst-case computational complexity arises the less explored question of typical-case complexity. A pivotal step in understanding typical-case complexity is the study of random CSPs, where each constraint involves a finite number of variables. Pioneering work on this subject [3, 4] discovered that many problems are empirically harder close to the so-called satisfiability phase transition. This is a phase transition appearing at a critical constraint density $\alpha_s$ such that for $M/N = \alpha < \alpha_s$ almost every large instance of the problem has at least one solution, and for $\alpha > \alpha_s$ almost all large instances have no solution.

Studies of phase transitions such as the one occurring in the satisfiability problem are natural for statistical physicists. Indeed, the methods developed to study frustrated disordered systems like glasses and spin glasses [5] have turned out to be very fruitful in the study of several CSPs. In particular, they allow structural studies which aim at understanding how the difficulty of a problem is related to the geometrical organization of its solutions. Several other phase transitions were described in this context. The most important one is probably the clustering transition, known as the dynamical glass transition in the mean-field theory of glasses. It was computed that in the region where the density of constraints is below the satisfiability threshold there exists a phase where the space of solutions splits into ergodically separated groups, called clusters. Another important property of the clusters concerns the freezing of the variables. A variable is frozen in a cluster if it takes the same value in all the solutions of this cluster. It has been conjectured that the clustering [...] and the freezing of variables [9] are two ingredients which contribute to making a random CSP hard. But the predictions for the easy/hard transition in a general random CSP are still not fully quantitative. The present work provides further insight into this subject.

In this paper we present a detailed study of the locked CSPs, introduced recently in [10]. The special property of the locked problems is that clusters are point-like: every cluster contains only one solution. Therefore, as soon as the system is in a clustered phase, all the variables are frozen in each cluster, and the clustering and freezing phase transitions occur simultaneously. Consequently, the organization of the space of solutions is much simpler than in the commonly studied K-satisfiability or graph coloring [11, 12, 13]. But at the same time, and unlike in the K-satisfiability or graph coloring problems, the whole clustered phase is extremely hard for all existing algorithms, and the clustering/freezing threshold seems to coincide very precisely with the onset of this hardness.
The interest in the locked problems is thus twofold:

(a) Locked problems are very simple: as the clusters of solutions are point-like, many of the quantities of interest can be computed using simpler tools than in the canonical K-satisfiability problem. This is particularly interesting from the mathematical point of view, because several of their properties become accessible to rigorous proofs. From a broader point of view, the locked problems should be useful as simple models of glass-forming liquids, because their phase diagram can be studied without any need to introduce the complicated scheme of 'replica symmetry breaking' [...].

(b) Locked problems are very hard: from the algorithmic point of view the whole clustered phase of the locked problems is extremely hard; none of the known algorithms is able to find solutions efficiently. This suggests using locked CSPs as hard benchmarks. At the same time, one may hope that the performance of some algorithms will be simpler to analyze when they are applied to the locked problems than in the general case.

This paper is organized as follows. In Section II we define the random occupation problems and the random locked occupation problems (LOPs), on which we illustrate our main findings. In Section III we write the equations needed to describe the phase diagram of the occupation problems, using well-known tools from statistical physics and probability theory. In Section IV we summarize the basic properties of the phase diagram in general random CSPs and then discuss in detail the situation in the locked problems. We also discuss the class of so-called balanced LOPs, which are even simpler from the mathematical point of view. Finally, Section V presents our findings on algorithmic performance in the occupation problems: empirical data using the best known random CSP solver, belief propagation reinforcement, indicate that the clustering threshold is close to the boundary between the easy and hard regions. We also analyze the non-locked occupation problems for comparison. A short summary of the results and perspectives concludes the paper in Section VI.

II. DEFINITIONS 

A. Locked occupation problems 

We shall study a broad class of problems called 'occupation problems'. An occupation problem involves N binary variables $s_i \in \{0,1\}$ ($s_i = 0$ is referred to as "site i is empty", and $s_i = 1$ as "occupied") and M constraints, indexed by $b \in \{1,\dots,M\}$. Each constraint b involves $K_b$ distinct variables, and is defined by a 'constraint word' with $K_b + 1$ bits, which we write as $A^b = A^b_0 A^b_1 \dots A^b_{K_b}$, where $A^b_r \in \{0,1\}$. We denote by $\partial b$ the indices of all variables involved in the constraint b. The constraint b is satisfied if and only if the sum $r = \sum_{i\in\partial b} s_i$ of all its variables is such that $A^b_r = 1$. In other words, in order for constraint b to be satisfied, the number r of occupied sites in its neighborhood must be such that $A^b_r = 1$ (this unified notation for the occupation problems was introduced in [...]).

Definition: An occupation problem is locked if and only if: 

(a) For every constraint $b \in \{1,\dots,M\}$, the vector $A^b$ is such that, for all $i = 0,\dots,K_b-1$: $A^b_i A^b_{i+1} = 0$.

(b) Every variable appears in at least two different constraints. 

In this paper, we shall study only 'constraint-regular' problems, in which all of the constraints are described by the same constraint word: for all $b \in \{1,\dots,M\}$, $K_b = K$ and $A^b = A$. Furthermore, in order to focus on difficult cases, we shall only consider the occupation problems where neither the totally empty nor the totally occupied configuration is a solution, i.e. we restrict to the cases where $A_0 = A_K = 0$. It is convenient to use the factor graph description of a problem, where sites and constraints are vertices, and an edge connects a constraint a to a site i whenever i appears in constraint a (see Fig. 1). An instance of a constraint-regular occupation model is fully described by its factor graph (where all constraint vertices have degree K) and the (K + 1)-component vector A. The locked problems are thus characterized by the facts that (i) there are no two consecutive '1's in the word $A = A_0 A_1 \dots A_K$, and (ii) their factor graph has no leaves.
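As an illustration, conditions (i) and (ii) can be checked mechanically. The sketch below is our own, not code from this work; it assumes a constraint word is given as a string such as "0100" and an instance as a list of constraints, each a list of distinct variable indices. The names `is_locked_word` and `is_locked_instance` are hypothetical helpers.

```python
def is_locked_word(A: str) -> bool:
    """Condition (i): the word A_0 A_1 ... A_K has no two consecutive '1's."""
    return all(not (A[r] == "1" and A[r + 1] == "1") for r in range(len(A) - 1))


def is_locked_instance(A: str, constraints: list[list[int]], n_vars: int) -> bool:
    """Conditions (i) and (ii): locked word and no leaves in the factor graph."""
    degree = [0] * n_vars
    for scope in constraints:          # each scope lists K distinct variables
        for i in scope:
            degree[i] += 1
    return is_locked_word(A) and min(degree) >= 2


# 1-in-3 SAT on a small instance where every variable is in two constraints
print(is_locked_instance("0100", [[0, 1, 2], [3, 4, 5], [0, 3, 4], [1, 2, 5]], 6))  # True
print(is_locked_word("01110"))  # False: bicoloring with K = 4 is not locked
```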

Well-studied examples of occupation problems include: 

• Ising anti-ferromagnet: A = 010

• Odd parity checks (anti-ferromagnetic K-spin model, with K even): A = 01010...1010

• Positive 1-in-K SAT (exact cover): A = 0100...00

• Perfect matching in K-regular graphs: each variable belongs to two constraints and A = 01000...00 [...]

• Bicoloring (positive NAE-SAT): A = 0111...110 [...]

• Circuits going through all the points: A = 001000...00 [...]

All these examples, except the bicoloring, are locked on graphs without leaves. 

For the occupation problems which have not been studied previously, we will use names derived in the following way: A = 010100 is the 1-or-3-in-5 SAT, A = 010010 is the 1-or-4-in-5 SAT, etc.
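The naming rule simply lists the allowed numbers of occupied sites r (the positions of the '1's in A) followed by K. A small sketch, where `occupation_problem_name` is a hypothetical helper of ours:

```python
def occupation_problem_name(A: str) -> str:
    """Name built from the positions r with A_r = 1, e.g. 010100 -> 1-or-3-in-5 SAT."""
    K = len(A) - 1
    allowed = [str(r) for r, bit in enumerate(A) if bit == "1"]
    return "-or-".join(allowed) + f"-in-{K} SAT"


print(occupation_problem_name("010100"))  # 1-or-3-in-5 SAT
print(occupation_problem_name("010010"))  # 1-or-4-in-5 SAT
```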




FIG. 1: A factor graph representation of an instance of the 1-or-3-in-5 SAT (A = 010100). The squares are the constraints. Full/empty circles are occupied/empty sites. The two parts show two examples of satisfying assignments ("solutions") of this instance. As there are no leaves (each variable belongs to at least two constraints) and A satisfies $A_i A_{i+1} = 0$ for all $i = 0,\dots,4$, this instance is locked.

From the computational complexity point of view, Schaefer's theorem [23] implies that most of the occupation problems are NP-complete. The exceptions are the parity checks, which amount to linear systems of equations over GF(2), and some of the cases where the variables have degree 2, such as the perfect matching.

The crucial property of the locked occupation problems is that, in order to go from one solution to another, one must flip at least a closed loop of variables. This property can be used to generalize the definition of locked problems to a much wider class of constraint satisfaction problems than the occupation problems; in particular, the variables do not need to be binary. Some examples of locked problems which are not occupation problems are the XOR-SAT (p-spin) problem on factor graphs without leaves [24], or all the uniquely extensible models [25].
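The loop property can be observed directly on a tiny instance. The sketch below (ours, purely illustrative) enumerates all solutions of a hypothetical locked 1-in-3 SAT instance (A = 0100, every variable in exactly two constraints) by brute force and reports the smallest Hamming distance between two solutions. For a locked problem this distance can never be 1, because flipping a single variable shifts the sum r of each of its constraints by one, and $A_{r\pm1} = 0$ whenever $A_r = 1$.

```python
from itertools import combinations, product


def solutions(A, constraints, n_vars):
    """All configurations s in {0,1}^N satisfying every constraint (brute force)."""
    return [s for s in product((0, 1), repeat=n_vars)
            if all(A[sum(s[i] for i in scope)] == "1" for scope in constraints)]


def hamming(s, t):
    return sum(x != y for x, y in zip(s, t))


A = "0100"
constraints = [[0, 1, 2], [3, 4, 5], [0, 3, 4], [1, 2, 5]]
sols = solutions(A, constraints, 6)
d_min = min(hamming(s, t) for s, t in combinations(sols, 2))
print(len(sols), d_min)  # 5 2
```

The two solutions at distance 2, (0,1,0,1,0,0) and (0,1,0,0,1,0), differ by flipping variables 3 and 4, which both belong to the constraints [3, 4, 5] and [0, 3, 4]: the flipped set closes a loop of length 4 in the factor graph.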



B. Ensembles of random occupation problems 



We shall study some random ensembles of locked occupation problems, in which the factor graph is chosen from some ensemble of random bipartite graphs. We consider constraint-regular occupation problems where each constraint involves K variables and is characterized by the constraint word A. An ensemble is characterized by a probability distribution $Q(l)$. To create an instance of the random occupation problem with N variables, we draw N independent random numbers $l_i$ from the distribution $Q(l)$, with the additional constraint that $\sum_{i=1}^{N} l_i/K = M$ is an integer. The factor graph that characterizes an instance is then chosen uniformly at random from all the possible graphs with N variables and M constraints such that, for all $i = 1,\dots,N$, the variable i is connected to $l_i$ constraints.
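One common way to sample such graphs, sketched below under our own assumptions, is the configuration model: each variable i receives $l_i$ "sockets", the sockets are shuffled and cut into M groups of K, and the matching is rejected whenever a constraint would contain the same variable twice. Conditioning the uniform matching on having no repeated variable gives the uniform distribution over the admissible graphs; the rejection step can, however, become slow for strongly heterogeneous degrees. The function `random_occupation_graph` and its `sample_degree` argument are hypothetical.

```python
import random


def random_occupation_graph(N, K, sample_degree, rng):
    """i.i.d. degrees l_i conditioned on K | sum(l_i), then a uniformly random
    matching of variable sockets to constraint slots, rejected until every
    constraint contains K distinct variables."""
    while True:
        degrees = [sample_degree(rng) for _ in range(N)]
        if sum(degrees) % K == 0:
            break
    sockets = [i for i, l in enumerate(degrees) for _ in range(l)]
    M = len(sockets) // K
    while True:
        rng.shuffle(sockets)
        constraints = [sockets[K * a:K * (a + 1)] for a in range(M)]
        if all(len(set(scope)) == K for scope in constraints):
            return degrees, constraints


rng = random.Random(1)
degrees, constraints = random_occupation_graph(30, 3, lambda r: 3, rng)  # regular, L = 3
print(len(constraints))  # M = N * L / K = 30
```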

In this paper we will consider mainly two degree distributions:

• Regular: $Q(l) = \delta_{l,L}$, in which all the variables take part in L clauses.

• Truncated Poissonian:

$$ Q(0) = Q(1) = 0, \qquad Q(l) = \frac{e^{-c}}{1-(1+c)\,e^{-c}}\,\frac{c^{l}}{l!} \quad \text{for } l \ge 2, \tag{1} $$

where c > 0. The average "connectivity" (variable degree) is then

$$ \bar l = c\,\frac{1-e^{-c}}{1-(1+c)\,e^{-c}}. \tag{2} $$

In the cavity method one also needs the excess degree distribution q(l), defined as the distribution of the number of neighbors on one side of an edge chosen uniformly at random:

$$ q(0) = 0, \qquad q(l) = \frac{(l+1)\,Q(l+1)}{\bar l}. \tag{3} $$
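As a quick numerical check of (1)-(3), the following sketch (ours; the cutoff `l_max` is an assumption, harmless for moderate c) tabulates the truncated Poissonian, compares its mean with the closed form (2), and verifies that the excess degree distribution is normalized.

```python
import math


def truncated_poisson(c, l_max=80):
    """Q(l) of eq. (1), mean degree of eq. (2), excess-degree distribution q(l) of eq. (3)."""
    norm = 1.0 - (1.0 + c) * math.exp(-c)
    Q = [0.0, 0.0] + [math.exp(-c) * c**l / (math.factorial(l) * norm)
                      for l in range(2, l_max + 1)]
    lbar = c * (1.0 - math.exp(-c)) / norm          # closed form (2)
    q = [(l + 1) * Q[l + 1] / lbar for l in range(l_max)]
    return Q, lbar, q


Q, lbar, q = truncated_poisson(2.5)
print(sum(Q), sum(l * Ql for l, Ql in enumerate(Q)), lbar, sum(q))
```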
We shall be interested in the properties of large instances, i.e. in the 'thermodynamic limit' where one sends N → ∞ and M → ∞, keeping K and Q(l) fixed; this results in a fixed density of constraints $M/N = \bar l/K$. Our main results are easily generalizable to any degree distribution Q(l) which has a finite second moment. For every such distribution, a typical factor graph is locally tree-like: the shortest loop going through a typical variable has a length which scales as log N. The crucial property of the locked occupation problems is that, in order to go from one solution of the problem to another solution, one must flip at least one closed loop of variables. On the random locally tree-like factor graphs this means that at least of order log N variables need to be changed.



III. THE SOLUTION OF RANDOM OCCUPATION PROBLEMS 

The cavity method [...] is nowadays the standard tool to compute the phase diagram of random locally tree-like constraint satisfaction problems. Depending on the structure of the space of solutions of the problem, different versions (levels of replica symmetry breaking) of the method are needed. In this section we state the cavity equations for the occupation problems. For a detailed derivation and discussion of the method see [...].

We index the variables by i, j, k, ... going from 1 to N, and the constraints by a, b, c, ... going from 1 to M. The energy of the occupation problems, i.e. the number of violated constraints, then reads

$$ H(\{s\}) = \sum_{a=1}^{M} \delta\Big(A_{\sum_{i\in\partial a} s_i},\,0\Big). \tag{4} $$

In this paper we shall study only the instances where solutions (ground states of zero energy) exist, and we shall focus 
on the uniform measure over all solutions. 
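Eq. (4) translates directly into code. In the sketch below (ours), a configuration is a tuple of 0/1 values, and the constraints and the word A = 0100 are a hypothetical example:

```python
def energy(config, constraints, A):
    """Eq. (4): number of constraints whose occupation count r has A_r = 0."""
    return sum(A[sum(config[i] for i in scope)] == "0" for scope in constraints)


constraints = [[0, 1, 2], [3, 4, 5], [0, 3, 4], [1, 2, 5]]
print(energy((0, 1, 0, 1, 0, 0), constraints, "0100"))  # 0: a solution
print(energy((0, 0, 0, 0, 0, 0), constraints, "0100"))  # 4: every constraint has r = 0
```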



A. The replica symmetric solution 




FIG. 2: Part of the factor graph to illustrate the meaning of indices in the belief propagation equations (5a)-(5b).



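The replica symmetric version of the cavity method is also known under the name belief propagation [15, 26, 27]. It exploits the local tree-like property of the factor graph, assuming that correlations decay fast enough. The basic quantities used in this approach are messages. We define $\psi^{a\to i}_{s_i}$ as the probability that the constraint a is satisfied, conditioned on the value of variable i being $s_i$. Similarly, $\chi^{j\to a}_{s_j}$ is the probability that the variable j takes value $s_j$, conditioned on the constraint a having been removed from the graph. The messages then satisfy the belief propagation (BP) equations

$$ \psi^{a\to i}_{s_i} = \frac{1}{Z^{a\to i}} \sum_{\{s_j\}_{j\in\partial a - i}} \delta\Big(A_{s_i+\sum_{j\in\partial a-i} s_j},\,1\Big) \prod_{j\in\partial a - i} \chi^{j\to a}_{s_j}, \tag{5a} $$

$$ \chi^{j\to a}_{s_j} = \frac{1}{Z^{j\to a}} \prod_{b\in\partial j - a} \psi^{b\to j}_{s_j}, \tag{5b} $$

where $Z^{a\to i}$ and $Z^{j\to a}$ are normalization constants. Fig. 2 shows the corresponding part of the factor graph. The marginal probabilities ("beliefs") are then expressed as

$$ \chi^{i}_{s_i} = \frac{1}{Z^{i}} \prod_{a\in\partial i} \psi^{a\to i}_{s_i}. \tag{6} $$

The replica symmetric entropy (logarithm of the number of solutions, divided by the system size) then reads

$$ s = \frac{1}{N}\Big[\sum_{a=1}^{M} \log Z^{a+\partial a} - \sum_{i=1}^{N} (l_i-1)\log Z^{i}\Big], \tag{7} $$

where $l_i$ is the degree of variable i and

$$ Z^{a+\partial a} = \sum_{\{s_i\}_{i\in\partial a}} \delta\Big(A_{\sum_{i\in\partial a} s_i},\,1\Big) \prod_{i\in\partial a}\ \prod_{b\in\partial i - a} \psi^{b\to i}_{s_i}, \tag{8a} $$

$$ Z^{i} = \sum_{s_i} \prod_{a\in\partial i} \psi^{a\to i}_{s_i} \tag{8b} $$

are the exponentials of the entropy shifts when the node a and its neighbors (resp. the node i) are added.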
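A minimal sketch of eqs. (5)-(8) on a single instance, written by us for illustration: `run_bp` and `sum_distribution` are hypothetical names, messages are updated in plain sweeps without damping or a convergence test, and a message whose two components both vanish (a contradiction) would cause a division by zero. The sum over $\{s_j\}$ in (5a) and (8a) is computed as the distribution of the number of occupied neighbors, built one variable at a time.

```python
import math


def sum_distribution(pairs):
    """Weights of r = number of occupied variables, for independent 0/1 variables
    with weights (w0, w1)."""
    dist = [1.0]
    for w0, w1 in pairs:
        new = [0.0] * (len(dist) + 1)
        for r, p in enumerate(dist):
            new[r] += p * w0
            new[r + 1] += p * w1
        dist = new
    return dist


def run_bp(constraints, n_vars, A, iters=100):
    a = [int(ch) for ch in A]
    var_cons = [[] for _ in range(n_vars)]
    for c, scope in enumerate(constraints):
        for i in scope:
            var_cons[i].append(c)
    psi = {(c, i): (0.5, 0.5) for c, scope in enumerate(constraints) for i in scope}
    chi = {(i, c): (0.5, 0.5) for (c, i) in psi}

    def cavity(i, exclude=None):
        """Unnormalized product of psi^{b->i} over b in di, skipping `exclude`."""
        w0 = w1 = 1.0
        for b in var_cons[i]:
            if b != exclude:
                w0 *= psi[(b, i)][0]
                w1 *= psi[(b, i)][1]
        return w0, w1

    for _ in range(iters):
        for (i, c) in chi:                                    # eq. (5b)
            w0, w1 = cavity(i, c)
            chi[(i, c)] = (w0 / (w0 + w1), w1 / (w0 + w1))
        for c, scope in enumerate(constraints):               # eq. (5a)
            for i in scope:
                dist = sum_distribution([chi[(j, c)] for j in scope if j != i])
                p0 = sum(p * a[r] for r, p in enumerate(dist))
                p1 = sum(p * a[r + 1] for r, p in enumerate(dist))
                psi[(c, i)] = (p0 / (p0 + p1), p1 / (p0 + p1))

    total = 0.0                                               # eqs. (7)-(8)
    for c, scope in enumerate(constraints):
        dist = sum_distribution([cavity(i, c) for i in scope])
        total += math.log(sum(p * a[r] for r, p in enumerate(dist)))
    for i in range(n_vars):
        total -= (len(var_cons[i]) - 1) * math.log(sum(cavity(i)))
    beliefs = [cavity(i)[1] / sum(cavity(i)) for i in range(n_vars)]   # eq. (6)
    return total / n_vars, beliefs


# Tree: two 1-in-3 constraints sharing variable 2 (5 solutions, BP is exact)
s, beliefs = run_bp([[0, 1, 2], [2, 3, 4]], 5, "0100")
print(s * 5, math.log(5), beliefs[2])
```

On a tree, BP is exact, so the total entropy equals log 5 and the belief of the shared variable equals 1/5. On loopy random graphs, the same iteration is only the replica symmetric approximation discussed in the text.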

When one considers an ensemble of random graphs, the probability distribution of the messages can be found via the population dynamics technique [20]. Moreover, on the regular graph ensemble or for some of the balanced problems (see Sec. IV C) the solution is factorized. In the factorized solution the messages $\chi^{i\to a}$, $\psi^{a\to i}$ are independent of the edge (ia), and the replica symmetric solution can thus be found analytically.

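For instance, in the regular graph ensemble where each variable is present in L constraints, the factorized solution is

$$ \psi_0 = \frac{1}{Z^{\rm reg}} \sum_{r=0}^{K-1} \binom{K-1}{r}\, \delta(A_r,1)\, \psi_1^{(L-1)r}\, \psi_0^{(L-1)(K-1-r)}, \tag{9a} $$

$$ \psi_1 = \frac{1}{Z^{\rm reg}} \sum_{r=0}^{K-1} \binom{K-1}{r}\, \delta(A_{r+1},1)\, \psi_1^{(L-1)r}\, \psi_0^{(L-1)(K-1-r)}, \tag{9b} $$

where the normalization $Z^{\rm reg}$ is fixed by the condition $\psi_0 + \psi_1 = 1$. Given the solution $\psi_0, \psi_1$ of (9a)-(9b), the entropy reads

$$ s_{\rm reg} = \frac{L}{K} \log\Big[\sum_{r=0}^{K} \binom{K}{r}\, \delta(A_r,1)\, \psi_1^{(L-1)r}\, \psi_0^{(L-1)(K-r)}\Big] - (L-1)\log\big[\psi_0^{L} + \psi_1^{L}\big]. \tag{10} $$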
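The factorized equations (9a)-(9b) reduce to a single unknown, the ratio $\psi_1/\psi_0$. The sketch below (ours, with the hypothetical name `factorized_rs`) finds a root of $x = \log(\psi_1/\psi_0)$ by bisection in log space, which avoids overflow for large L, and then evaluates (10). It assumes $A_0 = A_K = 0$ with at least one '1' in A, which guarantees a sign change on the bracketing interval; if (9) had several fixed points, bisection would return just one of them.

```python
import math


def _log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def factorized_rs(A: str, L: int):
    a = [int(ch) for ch in A]
    K = len(a) - 1

    def log_T(s, x):
        # log of the sum in (9a) (s = 0) or (9b) (s = 1), divided by psi0^{(L-1)(K-1)}
        return _log_sum_exp([math.log(math.comb(K - 1, r)) + (L - 1) * r * x
                             for r in range(K) if a[r + s] == 1])

    def h(x):  # root <=> fixed point of (9a)-(9b), with x = log(psi1 / psi0)
        return x - (log_T(1, x) - log_T(0, x))

    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(lo) * h(mid) <= 0:
            hi = mid
        else:
            lo = mid
    psi1 = 1.0 / (1.0 + math.exp(-0.5 * (lo + hi)))
    psi0 = 1.0 - psi1
    first = math.log(sum(math.comb(K, r) * psi1 ** ((L - 1) * r) * psi0 ** ((L - 1) * (K - r))
                         for r in range(K + 1) if a[r] == 1))
    return psi0, psi1, L / K * first - (L - 1) * math.log(psi0 ** L + psi1 ** L)   # eq. (10)


print(factorized_rs("010", 3))   # Ising antiferromagnet: s = (1 - L/2) log 2 < 0
print(factorized_rs("0100", 2))  # 1-in-3 SAT with L = 2
```

For A = 010 the symmetric point $\psi_0 = \psi_1 = 1/2$ gives $s_{\rm reg} = (1 - L/2)\log 2$, negative for L = 3. For A = 0100 and L = 2 (perfect matchings, with constraints as vertices of a 3-regular graph), the result is $\log(4/3)/3$ per variable, which we checked by hand against (9)-(10).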



B. Reconstruction on trees 



Treating the locally tree-like random graph as a tree fails if long-range correlations are present in the system. More precisely [...], the replica symmetric assumption is correct if and only if the so-called point-to-set correlations decay to zero. The decay of these correlations is closely related [...] to the problem of reconstruction on trees [...], which we explain and analyze in this section.

The reconstruction on trees is defined as follows. First construct a tree of d generations having the same connectivity properties as a finite neighborhood of a random variable in the random factor graph. Assign the root a random value, then iteratively assign values to the descendants uniformly at random, but in such a way that the constraints are satisfied. Subsequently forget the assignment everywhere but on the leaves of the tree. The reconstruction on the tree is possible if and only if the information left in the values of the leaves about the value of the root does not go to zero as the size of the tree grows, d → ∞. The replica symmetric assumption is correct if and only if the reconstruction is not possible, in other words if there is no correlation between the root (point) and the leaves (set). Typically, when the average connectivity of variables is small the reconstruction is not possible, and when the connectivity is large the reconstruction is possible. The threshold connectivity is called the reconstructibility threshold, or the clustering transition. The clustering is then defined as a minimal decomposition of the space of solutions such that within the components (clusters) the point-to-set correlations do decay to zero [12].

It was shown in [28] that the analysis of the reconstruction on trees is equivalent to the solution of the one-step replica symmetry breaking (1RSB) equations at the value of the Parisi parameter m = 1 [...].






Instead of the general form of the 1RSB equations at m = 1 (see e.g. [30]), we shall only discuss here a simpler form, called the naive reconstruction in [31]. In general the naive reconstruction gives only an upper bound on the reconstructibility threshold, but in the locked problems it in fact gives the full information. The naive reconstruction consists in computing the probability that the value of the root is uniquely implied by the leaves (boundary conditions). Here we give the equations only for regular graph ensembles with variables of connectivity L, where the factorized replica symmetric solution (9) holds. Define $\mu_1$ (resp. $\mu_0$) as the probability that a variable which in the broadcasting had value 1 (resp. 0) is uniquely determined by the boundary conditions. One has:



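$$ \mu_1 = \frac{1}{\psi_1} \sum_{r=0}^{k} \delta(A_{r+1},1)\,\delta(A_r,0)\, g_k(r) \sum_{s=0}^{s_1} \binom{r}{s} \big[1-(1-\mu_0)^l\big]^{k-r} \big[1-(1-\mu_1)^l\big]^{r-s} (1-\mu_1)^{ls}, \tag{11a} $$

$$ \mu_0 = \frac{1}{\psi_0} \sum_{r=0}^{k} \delta(A_{r+1},0)\,\delta(A_r,1)\, g_k(r) \sum_{s=0}^{s_0} \binom{k-r}{s} \big[1-(1-\mu_1)^l\big]^{r} \big[1-(1-\mu_0)^l\big]^{k-r-s} (1-\mu_0)^{ls}, \tag{11b} $$

where $l = L-1$, $k = K-1$, and $g_k(r) = \binom{k}{r}\, \psi_1^{lr}\, \psi_0^{l(k-r)} / Z^{\rm reg}$. The indices $s_1, s_0$ in the second sum of both equations are the largest possible such that $s_1 \le r$, $s_0 \le K-1-r$, and $\sum_{s=0}^{s_1} A_{r-s} = 0$, $\sum_{s=0}^{s_0} A_{r+1+s} = 0$. The values $\psi_0, \psi_1$ are the fixed point of eqs. (9a)-(9b), and $Z^{\rm reg}$ is the corresponding normalization.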

These lengthy equations have in fact a simple meaning. The first sum is over the possible numbers r of occupied variables among the descendants in the broadcasting. The sum over s is over the number of variables which were not implied by any of their constraints, but for which the configuration of implied variables nevertheless implies the outgoing value. The term $1-(1-\mu)^l$ is the probability that at least one constraint implies the variable, and $(1-\mu)^l$ is the probability that none of the constraints implies the variable.
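The iteration of (11a)-(11b) can be sketched as follows. This is our own illustration (the function name is hypothetical). Instead of the explicit factor $1/(Z^{\rm reg}\psi_{0,1})$ it divides by the same sums that define $\psi_0$ and $\psi_1$ in (9a)-(9b), which is equivalent. The caller supplies $\psi_1$; for the 1-in-3 SAT word A = 0100 one finds from (9) that $(\psi_1/\psi_0)^L = 1/2$.

```python
from math import comb


def naive_reconstruction_step(mu0, mu1, A, L, psi1):
    """One iteration of eqs. (11a)-(11b) on the L-regular ensemble."""
    a = [int(ch) for ch in A]
    K = len(a) - 1
    k, l = K - 1, L - 1
    psi0 = 1.0 - psi1
    imp0, imp1 = 1.0 - (1.0 - mu0) ** l, 1.0 - (1.0 - mu1) ** l   # implied by >= 1 constraint
    free0, free1 = (1.0 - mu0) ** l, (1.0 - mu1) ** l             # implied by none

    def g(r):  # g_k(r) without the factor 1/Z^reg, which cancels in the ratios
        return comb(k, r) * psi1 ** (l * r) * psi0 ** (l * (k - r))

    num1 = den1 = num0 = den0 = 0.0
    for r in range(k + 1):
        if a[r + 1] == 1:                  # parent 1, r occupied children
            den1 += g(r)
            if a[r] == 0:
                s1 = 0                     # largest s1 <= r with A_{r-s} = 0 for all s <= s1
                while s1 < r and a[r - s1 - 1] == 0:
                    s1 += 1
                num1 += g(r) * sum(comb(r, s) * imp0 ** (k - r) * imp1 ** (r - s) * free1 ** s
                                   for s in range(s1 + 1))
        if a[r] == 1:                      # parent 0, r occupied children
            den0 += g(r)
            if a[r + 1] == 0:
                s0 = 0                     # largest s0 <= k-r with A_{r+1+s} = 0 for all s <= s0
                while s0 < k - r and a[r + 2 + s0] == 0:
                    s0 += 1
                num0 += g(r) * sum(comb(k - r, s) * imp1 ** r * imp0 ** (k - r - s) * free0 ** s
                                   for s in range(s0 + 1))
    return num0 / den0, num1 / den1


final = {}
for L in (2, 3):
    rho = 2 ** (-1 / L)            # psi1/psi0 at the fixed point of (9) for A = 0100
    psi1 = rho / (1 + rho)
    mu0 = mu1 = 0.9                # start slightly below the frozen solution mu = 1
    for _ in range(60):
        mu0, mu1 = naive_reconstruction_step(mu0, mu1, "0100", L, psi1)
    final[L] = (mu0, mu1)
print(final)
```

For this particular word the L = 2 map reduces to $\mu_0 \leftarrow \mu_1$, $\mu_1 \leftarrow \mu_0^2$, so a small deviation from $\mu = 1$ grows and the iteration flows to $\mu = 0$. For L = 3 the deviation enters only at second order and the iteration returns to $\mu = 1$.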



C. Survey Propagation 



Survey propagation is a special form of the 1RSB equations corresponding to the value of the Parisi parameter m = 0. The main assumption of the 1RSB approach is that the space of solutions splits into clusters (pure states). To each cluster corresponds one fixed point of the BP equations. Survey propagation then consists of iterative equations for the following probabilities (surveys):

$$ {\rm prob}\big(\chi^{i\to a}_1 = 1,\ \chi^{i\to a}_0 = 0\big) = p^{i\to a}_1, \qquad {\rm prob}\big(\psi^{a\to i}_1 = 1,\ \psi^{a\to i}_0 = 0\big) = q^{a\to i}_1, \tag{12a} $$

$$ {\rm prob}\big(\chi^{i\to a}_1 = 0,\ \chi^{i\to a}_0 = 1\big) = p^{i\to a}_0, \qquad {\rm prob}\big(\psi^{a\to i}_1 = 0,\ \psi^{a\to i}_0 = 1\big) = q^{a\to i}_0, \tag{12b} $$

$$ p^{i\to a}_* = 1 - p^{i\to a}_1 - p^{i\to a}_0, \qquad q^{a\to i}_* = 1 - q^{a\to i}_1 - q^{a\to i}_0, \tag{12c} $$

where $q^{a\to i}_{1/0}$ is the probability over clusters that clause a is satisfied only if variable i takes value 1/0, and $q^{a\to i}_*$ is the probability that clause a can be satisfied by both values 1 and 0; similarly, $p^{i\to a}_{1/0}$ is the probability that variable i has to take value 1/0 if the clause a is not present, and $p^{i\to a}_*$ is the probability that the variable i can take both values 1 and 0 when clause a is not present. The survey propagation equations are then written in two steps: first the update of the p's knowing the q's,



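$$ p^{i\to a}_1 = \frac{1}{\mathcal N^{i\to a}} \Big[\prod_{b\in\partial i-a}\big(q^{b\to i}_1 + q^{b\to i}_*\big) - \prod_{b\in\partial i-a} q^{b\to i}_*\Big], \tag{13a} $$

$$ p^{i\to a}_0 = \frac{1}{\mathcal N^{i\to a}} \Big[\prod_{b\in\partial i-a}\big(q^{b\to i}_0 + q^{b\to i}_*\big) - \prod_{b\in\partial i-a} q^{b\to i}_*\Big], \tag{13b} $$

$$ p^{i\to a}_* = \frac{1}{\mathcal N^{i\to a}} \prod_{b\in\partial i-a} q^{b\to i}_*, \tag{13c} $$

second the update of the q's knowing the p's,

$$ q^{a\to i}_s = \frac{1}{\mathcal N^{a\to i}} \sum_{\{r_j\}_{j\in\partial a-i}} C_s(\{r_j\}) \prod_{j\in\partial a-i} p^{j\to a}_{r_j}, \qquad s \in \{1,0,*\}. \tag{14} $$

Here $\mathcal N^{i\to a}$ and $\mathcal N^{a\to i}$ are normalization constants, and the indices s and $r_j$ take values in {1, 0, *}. The function $C_1$/$C_0$ (resp. $C_*$) takes value 1 if and only if the incoming set $\{r_j\}$ forces the variable i to be occupied/empty (resp. leaves the variable i free); in all other cases the C's are zero. More specifically, let us call $n_1, n_0, n_*$ the numbers of indices 1, 0, * in the set $\{r_j\}$; then

• $C_1 = 1$ if and only if $A_{n_1+n_*+1} = 1$ and $A_{n_1+n} = 0$ for all $n = 0,\dots,n_*$;

• $C_0 = 1$ if and only if $A_{n_1} = 1$ and $A_{n_1+1+n} = 0$ for all $n = 0,\dots,n_*$;

• $C_* = 1$ if and only if there exist $m, n \in \{0,\dots,n_*\}$ such that $A_{n_1+n} = A_{n_1+m+1} = 1$.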
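Since $C_s$ depends on $\{r_j\}$ only through the counts $(n_1, n_*)$, the sum in (14) can be evaluated by accumulating the distribution of these counts one incoming survey at a time, instead of enumerating all $3^{K-1}$ combinations. The sketch below is ours; `C_values` and `sp_constraint_update` are hypothetical names, each incoming survey is a tuple $(p_1, p_0, p_*)$, and the normalization fails if every combination is contradictory.

```python
def C_values(n1, nstar, a):
    """Indicators (C_1, C_0, C_*) from the three conditions above; a is the word as ints."""
    c1 = a[n1 + nstar + 1] == 1 and all(a[n1 + n] == 0 for n in range(nstar + 1))
    c0 = a[n1] == 1 and all(a[n1 + 1 + n] == 0 for n in range(nstar + 1))
    cstar = (any(a[n1 + n] == 1 for n in range(nstar + 1))
             and any(a[n1 + m + 1] == 1 for m in range(nstar + 1)))
    return int(c1), int(c0), int(cstar)


def sp_constraint_update(incoming, A):
    """Eq. (14) for one edge a -> i; `incoming` lists (p1, p0, p*) for j in da - i."""
    a = [int(ch) for ch in A]
    dist = {(0, 0): 1.0}                       # (n1, n*) -> probability
    for p1, p0, ps in incoming:
        new = {}
        for (n1, ns), w in dist.items():
            for key, p in (((n1 + 1, ns), p1), ((n1, ns), p0), ((n1, ns + 1), ps)):
                new[key] = new.get(key, 0.0) + w * p
        dist = new
    q = [0.0, 0.0, 0.0]                        # q_1, q_0, q_*
    for (n1, ns), w in dist.items():
        for idx, c in enumerate(C_values(n1, ns, a)):
            q[idx] += c * w
    total = sum(q)                             # normalization N^{a->i}
    return tuple(x / total for x in q)


print(sp_constraint_update([(0, 1, 0), (0, 1, 0)], "0100"))  # both neighbors frozen empty
print(sp_constraint_update([(0, 0, 1), (0, 0, 1)], "0100"))  # both neighbors free
```

In 1-in-3 SAT, two frozen-empty neighbors force variable i to be occupied ($q_1 = 1$), while two free neighbors leave it free ($q_* = 1$).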



D. The first and the second moment 



In this section we give the formulas for the first and second moment method in general occupation problems. This allows for a direct probabilistic study of the balanced locked occupation problems introduced below in Sec. IV C.

For a given instance (or factor graph) G, define $\mathcal N_G$ as the number of solutions. The first moment is the average of $\mathcal N_G$ over the graph ensemble, which can also be written as

$$ \langle \mathcal N_G \rangle = \sum_{\{\sigma\}} {\rm Prob}\big(\{\sigma\}\ \text{is SAT}\big). \tag{15} $$

The 'annealed entropy' is then defined as $s_{\rm ann} = \log\langle\mathcal N_G\rangle/N$. By Jensen's inequality it is an upper bound on the quenched entropy $\langle\log\mathcal N_G\rangle/N$. In order to compute the first moment, we divide variables into groups according to their connectivity, and in each group we choose a fraction of occupied variables. The number of ways in which this can be done is then multiplied by the probability that such a configuration simultaneously satisfies all the constraints. After some algebra [30] we obtain the entropy of solutions with a fraction $0 < t < 1$ of occupied variables:

$$ s_{\rm ann}(t) = \frac{\bar l}{K} \log\Big[\sum_{r=0}^{K} \binom{K}{r} A_r \Big(\frac{t}{u(t)}\Big)^{r} (1-t)^{K-r}\Big] + \sum_{l} Q(l) \log\big[1 + u(t)^{l}\big], \tag{16} $$

where $u(t)$ is the inverse of

$$ t = \frac{1}{\bar l} \sum_{l} l\, Q(l)\, \frac{u^{l}}{1+u^{l}}. \tag{17} $$

The annealed entropy is then $s_{\rm ann} = \max_t s_{\rm ann}(t)$.

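The second moment of the number of solutions is defined as

$$ \langle \mathcal N_G^2 \rangle = \sum_{\{\sigma_1\},\{\sigma_2\}} {\rm Prob}\big(\{\sigma_1\}\ \text{and}\ \{\sigma_2\}\ \text{are both SAT}\big). \tag{18} $$

The second moment entropy is then defined as $s_{\rm 2nd} = \log\langle\mathcal N_G^2\rangle/N$. The Cauchy-Schwarz inequality then gives a lower bound on the satisfiability threshold via

$$ {\rm Prob}(\mathcal N_G > 0) \ge \frac{\langle \mathcal N_G \rangle^2}{\langle \mathcal N_G^2 \rangle}. \tag{19} $$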
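Written in terms of the two entropies (a short worked step, using only the definitions above), the right-hand side of (19) behaves as

```latex
\frac{\langle\mathcal N_G\rangle^2}{\langle\mathcal N_G^2\rangle}
  = \exp\Big\{N\big[\,2\,s_{\rm ann}-s_{\rm 2nd}\,\big]+o(N)\Big\},
\qquad
s_{\rm 2nd}\ \ge\ 2\,s_{\rm ann},
```

where the inequality follows from $\langle\mathcal N_G^2\rangle \ge \langle\mathcal N_G\rangle^2$. The bound (19) is therefore exponentially small unless $s_{\rm 2nd} = 2\,s_{\rm ann}$, and only in that case can the second moment method give a useful lower bound on the satisfiability threshold. For the regular ensemble one can check from (23)-(24) that the uncorrelated point $t_x = t^2$, $t_y = t_z = t(1-t)$ gives exactly $s_{\rm 2nd}(t_x,t_y,t_z) = 2\,s_{\rm ann}(t)$, so the condition is that the global maximum in (24) is attained at such a point.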

The second moment is computed in a similar manner as in [10, ...]. First we fix that in a fraction $t_x$ of nodes the variable is occupied in both solutions $\sigma_1, \sigma_2$ in (18). In a fraction $t_y$ the variable is occupied in $\sigma_1$ and empty in $\sigma_2$, and the other way round for $t_z$. We sum over all possible realizations of $t_x, t_y, t_z \ge 0$ such that $t_x + t_y + t_z \le 1$. This is multiplied by the probability that the two configurations $\sigma_1, \sigma_2$ both satisfy all the constraints. After some algebra we obtain [30]:

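$$ s_{\rm 2nd}(t_x,t_y,t_z) = \frac{\bar l}{K} \log p_A(t_x,t_y,t_z) + \sum_{l} Q(l) \log\Big[1 + \sum_{w\in\{x,y,z\}} u_w(t_x,t_y,t_z)^{l}\Big], \tag{20} $$

where $u_w(t_x,t_y,t_z)$, $w \in \{x,y,z\}$, are obtained by inverting the three equations

$$ t_w = \frac{1}{\bar l} \sum_{l} l\, Q(l)\, \frac{u_w^{l}}{1 + u_x^{l} + u_y^{l} + u_z^{l}}, \qquad w = x, y, z, \tag{21} $$

and the function $p_A(t_x,t_y,t_z)$ is defined as

$$ p_A(t_x,t_y,t_z) = \sum_{r_1,r_2=0}^{K} A_{r_1} A_{r_2} \sum_{s=\max(0,\,r_1+r_2-K)}^{\min(r_1,r_2)} \frac{K!}{(r_1-s)!\,(r_2-s)!\,s!\,(K-r_1-r_2+s)!} \Big(\frac{t_x}{u_x}\Big)^{s} \Big(\frac{t_y}{u_y}\Big)^{r_1-s} \Big(\frac{t_z}{u_z}\Big)^{r_2-s} (1-t_x-t_y-t_z)^{K-r_1-r_2+s}. \tag{22} $$

The second moment entropy is the global maximum: $s_{\rm 2nd} = \max_{t_x,t_y,t_z} s_{\rm 2nd}(t_x,t_y,t_z)$.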

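For the regular graphs $Q(l) = \delta_{l,L}$ the expressions for both the first and second moment simplify considerably. For the first moment, the inverse of (17) is explicit, $u = [t/(1-t)]^{1/L}$, and thus

$$ s_{\rm ann}(t) = \frac{L}{K} \log \sum_{r=0}^{K} \binom{K}{r} A_r \big[t^{r}(1-t)^{K-r}\big]^{\frac{L-1}{L}}. \tag{23} $$

For the second moment, the equations (21) are also explicitly invertible, and the second moment entropy for regular graphs reads

$$ s_{\rm 2nd}(t_x,t_y,t_z) = \frac{L}{K} \log \sum_{r_1,r_2=0}^{K} A_{r_1} A_{r_2} \sum_{s=\max(0,\,r_1+r_2-K)}^{\min(r_1,r_2)} \frac{K!}{(r_1-s)!\,(r_2-s)!\,s!\,(K-r_1-r_2+s)!} \Big[t_x^{s}\, t_y^{r_1-s}\, t_z^{r_2-s}\, (1-t_x-t_y-t_z)^{K-r_1-r_2+s}\Big]^{\frac{L-1}{L}}. \tag{24} $$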
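A numerical sketch of the regular-graph first moment (23), written by us: `annealed_entropy_regular` is a hypothetical name, and the maximum over t is taken on a uniform grid, which is crude but adequate for a single variable.

```python
import math


def annealed_entropy_regular(A: str, L: int, n_grid: int = 20000):
    """Return (max_t s_ann(t), argmax t) for eq. (23), using a grid t = i / n_grid."""
    a = [int(ch) for ch in A]
    K = len(a) - 1

    def s_ann(t):
        return L / K * math.log(sum(math.comb(K, r) * (t**r * (1 - t)**(K - r)) ** ((L - 1) / L)
                                    for r in range(K + 1) if a[r] == 1))

    return max((s_ann(i / n_grid), i / n_grid) for i in range(1, n_grid))


print(annealed_entropy_regular("010", 3))   # maximum at t = 1/2
print(annealed_entropy_regular("0100", 2))  # maximum near t = 1/3
```

For these two words, the annealed entropy coincides with the factorized replica symmetric entropy (10), namely $(1 - L/2)\log 2$ for A = 010 and $\log(4/3)/3$ for A = 0100 with L = 2.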



IV. THE PHASE DIAGRAM 
A. Non-locked occupation problems 

The phase diagram of the non-locked occupation problems that we have explored is qualitatively similar to that of K-satisfiability and graph coloring, studied recently in detail in [..., 12, ...]. We thus only briefly summarize the main findings, in order to be able to appreciate the difference between the locked and the non-locked problems.

As one adds constraints to a typical non-locked problem, the space of solutions undergoes several phase transitions. When the density of constraints is very small, the replica symmetric solution is correct and most of the solutions lie in one cluster. As the density of constraints is increased, the point-to-set correlations, defined via the reconstruction on trees, no longer decay to zero. This is the clustering transition; at this point the space of solutions splits into exponentially many well-separated (energetically or entropically) clusters. But as long as an exponential number of such clusters is needed to cover almost all the solutions, observables like the entropy, magnetizations, two-point correlations, etc. behave as if the replica symmetric solution was still correct. This phase is called the dynamical 1RSB. When the constraint density is further increased, the space of solutions undergoes the so-called condensation transition. In the condensed phase only a finite number of clusters is needed to cover an arbitrarily large fraction of solutions. Increasing the density of constraints further, one crosses the satisfiability transition, where all the solutions disappear.

We recall at this point that in the non-locked occupation problems, where the sizes of clusters fluctuate, the survey propagation equations are not equivalent to the reconstruction on trees. More technically, the 1RSB solutions at m = 0 and at m = 1 are different; for example, a nontrivial solution appears at different connectivities.

A second class of important phase transitions in the space of solutions of the non-locked problems concerns the so-called frozen variables, which might be responsible for the onset of algorithmic hardness [9]. A variable is frozen in a cluster if it takes the same value in all the solutions belonging to that cluster. A cluster is frozen if a finite fraction of variables are frozen in that cluster. A solution is frozen if it belongs to a frozen cluster. As the number of constraints is increased, the clusters tend to freeze. We define two transition points. The first one, called the rigidity transition [...], is defined as the point where almost all solutions become frozen. The second one, the freezing transition, is defined as the point where strictly all solutions become frozen.

In the cavity method every cluster is associated with a solution of the BP equations. A frozen variable i is described by a belief (6) which is either equal to $(\chi^i_0, \chi^i_1) = (1, 0)$ or to $(\chi^i_0, \chi^i_1) = (0, 1)$. The rigidity transition is then computed as the connectivity at which such "frozen beliefs" appear in the dominating clusters. If this transition happens before the condensation transition, then it is given by the onset of a nontrivial solution to the naive reconstruction, e.g. eqs. (11). The rigidity transition was computed for graph coloring in [31], for the bicoloring of hypergraphs in [...], and for K-SAT in [...]. The freezing transition was studied with probabilistic methods in K-SAT with large K in [33], and numerically in 3-SAT in [34].

B. Locked occupation problems 

Point-like clusters. The main property which makes the locked problems special is that every cluster consists of a single configuration and thus has zero internal entropy. One way to show this is to realize that in the locked problems, if $\{s_i\}$ is a satisfying configuration, then

$$ \psi^{a\to i}_{s_i} = 1, \qquad \psi^{a\to i}_{1-s_i} = 0, \tag{25a} $$

$$ \chi^{i\to a}_{s_i} = 1, \qquad \chi^{i\to a}_{1-s_i} = 0 \tag{25b} $$

is a fixed point of the BP eqs. (5a)-(5b). For (25a), changing $s_i$ alone shifts the sum r of constraint a by one, and $A_{r\pm1} = 0$ whenever $A_r = 1$ by the locked condition; for (25b), every variable belongs to at least one other constraint. The corresponding entropy is then zero, as $Z^{a+\partial a} = Z^i = 1$ for all i, a. In the cavity method, the fixed points of the belief propagation equations correspond to clusters. Thus, in the locked problems, every solution may be thought of as a cluster. Such a situation was previously encountered in a few problems [..., 37] and called the frozen 1RSB, because all the variables, clusters and solutions are frozen in such a case.

The clustering transition. In terms of the reconstruction on trees, the situation in the locked problems is trivial, because the boundary conditions on the leaves always uniquely imply all the variables in the body of the tree, and also the root. However, one may ask what happens if the assignment of a small fraction of the variables on the leaves is also forgotten; we call this the small noise reconstruction on trees [...]. In the non-locked problems nothing changes. In the locked problems the small noise reconstruction is not equivalent to the reconstruction. At sufficiently small connectivities the small noise reconstruction is not possible; that is, if we introduce a small noise in the leaves, all the information about the root is lost. In the same spirit, we showed that every solution corresponds to a fixed point of the belief propagation of the type (25), but we did not ask whether such a fixed point is stable under small perturbations. If an infinitesimal fraction of messages in (25) is changed, will the iterations (5a)-(5b) converge back to the unperturbed fixed point or not? Again, for sufficiently small connectivity they will not. This leads us back to a definition of the clustering transition which needs to be refined for the locked problems.

We thus define the clustering transition as the threshold for the small noise reconstruction. As all the clusters are frozen, the reconstruction problem is equivalent to the naive reconstruction, which deals only with the frozen variables. So, for example, on the ensemble of random regular graphs it is sufficient to investigate the stability of the solution $\mu_0 = \mu_1 = 1$ of eqs. (11a)-(11b) under iteration. It is immediate to see that if L ≥ 3 then the solution $\mu_1 = \mu_0 = 1$ of (11a)-(11b) is always iteratively stable: for $l = L-1 \ge 2$ a perturbation $1 - \mu = \epsilon$ enters the right-hand sides only through $(1-\mu)^l = O(\epsilon^2)$. When L = 2 we observed empirically that the solution $\mu_1 = \mu_0 = 1$ is not stable and the only other solution is $\mu_1 = \mu_0 = 0$. Thus in the regular graph ensemble of the locked problems the clustering transition is at L = 3.

For a general graph ensemble it is simpler to realize that, as the internal entropy of clusters is zero, the 1RSB solution does not depend on the value of the Parisi parameter m. Thus, in particular, the small noise reconstruction is equivalent to the iterative stability of the BP-like fixed point of the survey propagation equations.

We have found that, in the locked occupation problems, the SP equations (13)-(14), when initialized randomly, have two possible iterative fixed points:

• The trivial one: q_*^{a→i} = p_*^{i→a} = 1, q_1^{a→i} = p_1^{i→a} = q_0^{a→i} = p_0^{i→a} = 0 for all edges (ai).

• The BP-like one: q_*^{a→i} = p_*^{i→a} = 0, q_s^{a→i} = ψ_s^{a→i}, p_s^{i→a} = χ_s^{i→a} for all edges (ai), where ψ and χ are the solution of the BP equations (5a)-(5b) found with high probability by iterating the equations from a uniformly random initial condition.

The small noise reconstruction is then investigated, using population dynamics, through the stability of the BP-like fixed point under iteration. If it is stable, the small noise reconstruction is possible and the phase is clustered. If it is not stable, we are in the liquid phase. From a geometric point of view, we conjecture that in the liquid phase the Hamming distance separation between solutions grows only proportionally to log N; on the contrary, when small noise reconstruction is possible we expect this Hamming distance to be extensive (proportional to N).

The satisfiability threshold: The BP eqs. (5a)-(5b) have many fixed points. However, when we solve them iteratively starting from a random initial condition, we always find the same fixed point, which does not correspond to a satisfying assignment of the form (25). We call this fixed point and its corresponding entropy the replica symmetric solution. It should actually be thought of as a fixed point of the survey propagation equations, as explained in the previous paragraph. The important fact is that it gives the correct entropy (7), and also the correct marginal probabilities.
The satisfiability threshold in the locked problems is then computed as the average connectivity l_s at which the replica symmetric entropy (7) decreases to zero [35], s(l_s) = 0. This is the first of many quantities in the locked problems which can be computed with much smaller effort than in the non-locked problems. The condensed phase, where the space of solutions is dominated by a finite number of clusters, does not exist in the locked problems, and the condensation transition coincides with the satisfiability threshold.





FIG. 3: Sketch of the phase diagram in the locked problems. At low constraint density l < l_d the solutions are separated by a logarithmic distance, but if any sort of noise is introduced this separation disappears. In the clustered phase l_d < l < l_s the space of solutions is made of well-separated single solutions. Finally, beyond the satisfiability transition l_s, solutions do not exist.

Summary of the phase diagram: In contrast to the zoo of phase transitions in non-locked problems, in the locked problems we find only three phases, sketched in Fig. 3; the critical connectivity values are given in Table I.

• The liquid phase, for connectivities l < l_d. In this phase the small noise reconstruction is not possible. Equivalently, the BP-like iterative fixed point of the survey propagation equations is not stable. If one considers the problem at a very small temperature, the 1RSB equations have only the trivial solution; such a situation was observed previously in the perfect matching problem [isl . [ssj. We expect that the Hamming distance separation between solutions in this phase is only logarithmic.

• The clustered phase, for l_d < l < l_s. In this phase the small noise reconstruction is possible. The BP-like iterative fixed point of the survey propagation equations is stable. The 1RSB equations have a non-trivial solution even at an infinitesimal temperature. We expect that the solutions are separated by an extensive Hamming distance; in other words, there is a gap in the weight enumerator function, just as in XOR-SAT [39]. This property is crucial in low density parity check codes [40].

• The unsatisfiable phase, for l > l_s: no more solutions exist.

All the other phase transitions we described for the non-locked problems become very simple: the clustering transition coincides with the rigidity and freezing transitions, and the satisfiability transition coincides with the condensation one.

Finally, we would like to mention the stability of the frozen 1RSB solution towards more levels of replica symmetry breaking. In more geometrical terms, one should check whether the solutions tend to aggregate into clusters. This is called stability of type I in the literature [41]. In the locked problems it is equivalent to the finiteness of the spin glass susceptibility. In all the locked occupation problems we have studied, including all those in Table I, we have seen that the frozen 1RSB solution is always stable in the satisfiable phase and sometimes becomes unstable at a point in the unsatisfiable phase. This means that our description of the satisfiable phase, and the determination of the thresholds l_d and l_s, should be exact.

C. The balanced LOPs 

We have seen that the phase diagram in the locked problems is much simpler than in the more studied constraint satisfaction problems such as K-SAT or coloring. In this section we describe a subclass of the locked problems, the so-called balanced locked problems, where the situation is even simpler. In particular, the clustering and the satisfiability thresholds can be determined easily, and the second moment method can be used to prove rigorously the validity of this determination of the satisfiability threshold. This makes the balanced locked problems very interesting from the mathematical point of view.

The balanced occupation problems are defined via the property that two random solutions are almost surely at Hamming distance N/2 + o(N). This property may of course depend on the connectivity distribution Q(l). A necessary condition for the problem to be balanced is that the vector A be palindromic, meaning that A_r = A_{K-r}. But not all palindromic problems are balanced; the simplest such example is the 1-or-4-in-5 SAT, A = 010010, where the symmetry is spontaneously broken in the same way as in a ferromagnetic Ising model.

A         name             L_s  c_d       c_s       l_d       l_s
0100      1-in-3 SAT       3    0.685(3)  0.946(4)  2.256(3)  2.368(4)
01000     1-in-4 SAT       3    1.108(3)  1.541(4)  2.442(3)  2.657(4)
00100*    2-in-4 SAT       3    1.256     1.853     2.513     2.827
01010*    4-odd-PC         5    1.904     3.594     2.856     4
010000    1-in-5 SAT       3    1.419(3)  1.982(6)  2.594(3)  2.901(6)
001000    2-in-5 SAT       4    1.604(3)  2.439(6)  2.690(3)  3.180(6)
010100    1-or-3-in-5 SAT  5    2.261(3)  4.482(6)  3.068(3)  4.724(6)
010010    1-or-4-in-5 SAT  4    1.035(3)  2.399(6)  2.408(3)  3.155(6)
0100000   1-in-6 SAT       3    1.666(3)  2.332(4)  2.723(3)  3.113(4)
0101000   1-or-3-in-6 SAT  6    2.519(3)  5.123(6)  3.232(3)  5.285(6)
0100100   1-or-4-in-6 SAT  4    1.646(3)  3.366(6)  2.712(3)  3.827(6)
0100010   1-or-5-in-6 SAT  4    1.594(3)  2.404(6)  2.685(3)  3.158(6)
0010000   2-in-6 SAT       4    1.868(3)  2.885(4)  2.835(3)  3.479(4)
0010100*  2-or-4-in-6 SAT  6    2.561     5.349     3.260     5.489
0001000*  3-in-6 SAT       4    1.904     3.023     2.856     3.576
0101010*  6-odd-PC         7    2.660     5.903     3.325     6

TABLE I: The locked cases of the occupation CSPs for K ≤ 6. In the regular graph ensemble the space of solutions is clustered for L ≥ L_d = 3, and the problem is unsatisfiable for L ≥ L_s. The values c_d and c_s are the critical parameters of the truncated Poissonian ensemble (1); the corresponding average connectivities l_d and l_s are given via eq. (2). In all these problems the replica symmetric solution is stable at least up to the satisfiability threshold. The balanced cases are marked with *; their dynamical threshold follows from (29), and their satisfiability threshold can be computed from the second moment method.
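The l columns of Table I follow from the c columns through eq. (2), which lies outside this section. As a hedged cross-check, the Python sketch below assumes that the truncated Poissonian ensemble is a Poisson(c) degree law conditioned on degree at least 2, whose mean is l = c(1 - e^{-c})/(1 - e^{-c} - c e^{-c}); the function name is our own. Under this assumption the sketch reproduces the tabulated l_d and l_s to within about 10^{-3} for the rows it checks.

```python
import math


def mean_degree_truncated_poisson(c):
    """Mean degree of a Poisson(c) law conditioned on degree >= 2 (assumed form of eq. (2))."""
    e = math.exp(-c)
    return c * (1.0 - e) / (1.0 - e - c * e)


# (problem, c_d, l_d, c_s, l_s), values copied from Table I
rows = [
    ("1-in-3 SAT", 0.685, 2.256, 0.946, 2.368),
    ("2-in-4 SAT", 1.256, 2.513, 1.853, 2.827),
    ("4-odd-PC", 1.904, 2.856, 3.594, 4.0),
    ("1-or-3-in-5 SAT", 2.261, 3.068, 4.482, 4.724),
]

for name, c_d, l_d, c_s, l_s in rows:
    print(f"{name:16s} l_d = {mean_degree_truncated_poisson(c_d):.3f} (table {l_d}),"
          f" l_s = {mean_degree_truncated_poisson(c_s):.3f} (table {l_s})")
```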

As we argued in the previous section, in the locked problems the replica symmetric approach (BP) gives the exact marginal probabilities and total entropy. Therefore a problem is balanced if and only if the iterative fixed point of the BP equations (5a)-(5b) is such that all the beliefs are equal to 1/2.

We do not know of any simpler general rule to decide whether a problem is balanced. For K < 12, there is no exception to the following empirical rule: all the problems which can be obtained from the Fibonacci-like recursion

$$ 0A_K0 = A_{K+2}, \qquad 01A_K10 = A_{K+4} \qquad (26) $$

starting from A_2 = 010 or A_4 = 01010 are balanced in their satisfiable phase. There are, however, other balanced locked problems which cannot be obtained this way; the simplest example is A = 0001001000.
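The empirical rule (26) is easy to enumerate. The sketch below (helper names are hypothetical) generates all vectors reachable from A_2 = 010 and A_4 = 01010 up to a given K and checks that they are palindromic. The extra check that no two 1s are adjacent encodes our assumption about the locking condition on A, which is defined earlier in the paper. Every vector produced has even K, which is one way to see why an odd-K balanced case such as A = 0001001000 (K = 9) cannot arise from (26). The starred rows of Table I all appear in the generated set.

```python
def fibonacci_like_family(max_k):
    """Occupation vectors A_K (K <= max_k) generated by rule (26) from 010 and 01010."""
    family = {"010", "01010"}
    frontier = list(family)
    while frontier:
        a = frontier.pop()
        for child in ("0" + a + "0", "01" + a + "10"):
            # A_K has K + 1 entries, so K = len(child) - 1
            if len(child) - 1 <= max_k and child not in family:
                family.add(child)
                frontier.append(child)
    return family


def is_palindromic(a):
    """Necessary condition for balance stated in the text: A_r = A_{K-r}."""
    return a == a[::-1]


def has_no_adjacent_ones(a):
    """Assumed locking condition on A: A_r * A_{r+1} = 0 for all r."""
    return "11" not in a


family = fibonacci_like_family(12)
for a in sorted(family, key=lambda s: (len(s), s)):
    print(len(a) - 1, a, is_palindromic(a) and has_no_adjacent_ones(a))
```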

Clustering threshold in the balanced LOPs: The clustering threshold is given by the small noise reconstruction, i.e. by the stability of the naive reconstruction procedure as explained in Sec. IV B. In balanced LOPs the messages are symmetric, ψ0 = ψ1 = 1/2, and thus the probability for the root variable to be uniquely determined by the leaves is also independent of the value which has been broadcast: μ0 = μ1 = μ. For a graph ensemble with excess degree distribution q(l), one can write explicitly the self-consistency condition on μ:



5A f-^ \rl^ 



1 - ^9(0(1 -M)' 



k — s 



.1=0 



(27) 



where k ≡ K - 1 and $g_A = \sum_{r=0}^{k}\binom{k}{r}\delta_{A_{r+1},1} + \sum_{r=0}^{k}\binom{k}{r}\delta_{A_r,1}$. For the ensemble of graphs with truncated Poissonian degree distribution of coefficient c we derive from (27)



r=0 



s=0 



r\ /I -e-=^ 
sj V 1 - 



1 -e- 



(28) 
The clustering threshold is defined as the value of c where the fixed point μ = 1 becomes unstable. One gets:

The resulting values are summarized in Table I, where the balanced locked problems are marked by a *.
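Eq. (29), referred to in the caption of Table I, is not reproduced here. As a hedged cross-check, the following sketch linearizes the naive reconstruction around μ = 1 ourselves. To first order in ε = 1 - μ only variables with a single descendant constraint (excess degree 1) contribute. A constraint then fails to imply its root only if exactly one neighbour is undetermined and flipping that neighbour together with the root keeps the constraint satisfied; flipping the root alone is never allowed in a locked problem. This gives the condition q(1)·D = 1, where D is the mean number of such neighbours and q(1) = c e^{-c}/(1 - e^{-c}) is the excess-degree weight of a Poisson law truncated to degrees at least 2 (assumed to be the ensemble of Table I). The criterion and the function names are our own derivation, not the paper's eq. (29). For the five balanced rows of Table I it reproduces the listed c_d to within 10^{-3}. On regular graphs with L ≥ 3, q(1) = 0, consistent with the stability of μ = 1 stated above.

```python
import math
from math import comb


def pair_flip_number(A):
    """D: mean number of neighbours j of the root such that flipping the root and j
    together keeps the constraint satisfied. Satisfying configurations are weighted
    uniformly (psi0 = psi1 = 1/2); for a balanced LOP both root values carry
    weight g_A / 2, so pooling them gives the average over the root value."""
    a = [int(x) for x in A]
    k = len(a) - 2                      # k = K - 1; A has K + 1 entries
    total = weight = 0.0
    for s in (0, 1):                    # value of the root
        for r in range(k + 1):          # number of 1s among the k neighbours
            if a[r + s] != 1:
                continue
            w = comb(k, r)
            # a neighbour at 1 flips to 0 while the root flips: new total r - s
            ones_ok = r >= 1 and a[r - s] == 1
            # a neighbour at 0 flips to 1 while the root flips: new total r + 2 - s
            zeros_ok = r < k and a[r + 2 - s] == 1
            total += w * (r * ones_ok + (k - r) * zeros_ok)
            weight += w
    return total / weight


def clustering_c(A):
    """Bisection for q(1) * D = 1 with q(1) = c e^{-c} / (1 - e^{-c}) = c / (e^c - 1)."""
    D = pair_flip_number(A)
    lo, hi = 1e-9, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if D * mid / math.expm1(mid) > 1.0:   # mu = 1 still unstable: increase c
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for A in ("00100", "01010", "0010100", "0001000", "0101010"):
    print(A, round(clustering_c(A), 3))
```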

Satisfiability threshold in the balanced LOPs: For the balanced locked problems the replica symmetric 
entropy is given by: 



$$ s_{\rm sym}(l) = \log 2 + \frac{l}{K}\,\log\Big[\frac{1}{2^K}\sum_{r=0}^{K}\delta_{A_r,1}\binom{K}{r}\Big] \qquad (30) $$



where l is the average degree of variables. Notice the simple form of this entropy: in the balanced locked problems each added constraint keeps a fraction of the solutions exactly equal to the fraction of configurations that satisfy a single constraint. The satisfiability threshold is then given by the point l_s where this entropy is zero.
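Because (30) is linear in l, the threshold has a closed form, l_s = -K log 2 / log f_A, where f_A is the fraction of the 2^K configurations that satisfy one constraint. The short Python sketch below (function names are ours) evaluates it for the starred rows of Table I. The printed values agree with the l_s column to the three decimals shown there.

```python
import math
from math import comb


def satisfying_fraction(A):
    """Fraction of the 2^K configurations of one constraint that satisfy it."""
    K = len(A) - 1
    return sum(comb(K, r) for r in range(K + 1) if A[r] == "1") / 2 ** K


def s_sym(l, A):
    """Eq. (30): replica symmetric entropy of a balanced LOP at average degree l."""
    K = len(A) - 1
    return math.log(2) + (l / K) * math.log(satisfying_fraction(A))


def satisfiability_threshold(A):
    """l_s such that s_sym(l_s) = 0; (30) is linear in l, so the root is explicit."""
    K = len(A) - 1
    return -K * math.log(2) / math.log(satisfying_fraction(A))


for A in ("00100", "01010", "0001000", "0010100", "0101010"):
    print(A, round(satisfiability_threshold(A), 3))
```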

Second moment method in the balanced LOPs: In all the balanced LOPs that we have considered we found numerically that the second moment entropy (20) is exactly twice the annealed entropy (16), 2s_ann = s_2nd. A hint that this may happen comes from the following observations:

• The annealed entropy (16) has a stationary point at t = 1/2 (u = 1, x = 1). At this stationary point the entropy evaluates to (30).

• The second moment entropy (20) has a stationary point at t_x = t_y = t_z = 1/4 (u_x = u_y = u_z = 1, x = y = z = 1). At this stationary point the second moment entropy evaluates to twice the annealed entropy (30). This can be seen using Vandermonde's combinatorial identity.
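For reference, Vandermonde's identity in its standard form reads as follows; how it is specialized to evaluate (20) at the stationary point is not spelled out in this section. For instance, with m = 2, n = 3 and r = 2 the left-hand side is 1·3 + 2·3 + 1·1 = 10, which equals the binomial coefficient C(5, 2).

$$ \sum_{s=0}^{r}\binom{m}{s}\binom{n}{r-s} = \binom{m+n}{r} $$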



We checked numerically that in the balanced LOPs the global maxima of s_ann and s_2nd are always given by these stationary points (the second moment entropy has another stationary point at t_x = 1/2, t_y = t_z = 0 or t_x = 0, t_y = t_z = 1/2, but at this point it is equal to the first moment entropy at t = 1/2). On the contrary, in the non-locked or non-balanced problems we always found another competing maximum.

If one accepted the result 2s_ann = s_2nd and made the reasonable assumption that the satisfiability threshold is sharp, then Chebyshev's inequality p9| would prove the correctness of the satisfiability threshold computed from (30). Therefore the full class of balanced LOPs is a candidate for a rigorous mathematical determination of the satisfiability threshold. This would be quite interesting, as it would noticeably enlarge the list of problems where the threshold is known rigorously (so far only a handful of sparse NP-complete CSPs are in this category: the 1-in-K SAT [17, 44], the 2+p-SAT and the (3,4)-UE-CSP [H]).

Let us summarize qualitatively the main features of the balanced locked occupation problems that make the fluctuations of the number of solutions so small that the second moment method presumably gives the exact satisfiability threshold.

• Balancing: It is well known that the second moment method works better if most of the weight is on the most numerous configurations (that is, the half-filling ones). In the K-SAT problem several reweighting schemes were introduced in order to improve the second moment lower bounds [Tol. l32j|. This is also the reason why the second moment bound is much sharper in the balanced NAE-SAT (bicoloring) than in K-SAT [19].

• Reducing fluctuations in the connectivity: Naturally, reducing the fluctuations of the variable connectivity reduces the fluctuations of the number of solutions. Our work shows that the necessary step is to avoid leaves (variables of degree one). Fluctuating higher degrees do not really pose a problem.

• Locked nature of the problem: Finally, the key point is the locked structure of the problem. It was remarked in Q that cluster-related quantities fluctuate much less than solution-related ones. Thus the fact that clusters do not have a fluctuating size, but size 1, is the crucial property needed to make the second moment method sharp. This is exactly what happens in the XOR-SAT problem, where the second moment becomes exact when it is restricted to the 'core' of the graph [l^, EE].
V. NUMERICAL STUDIES



We shall show in this section that the LOPs, in their whole clustered phase, seem to be very hard from the 
algorithmic point of view. We shall illustrate this by testing and analyzing the performance of some of the best 
algorithms developed for random 3-satisfiability, the canonical hard constraint satisfaction problem. 

Our first study uses a complete algorithm and shows that, like in other problems such as satisfiability and coloring, 
the hardest instances are found in the neighborhood of the satisfiability transition. We then turn to incomplete 
algorithms, which are aimed at finding a SAT configuration when it exists. 

The best performance for incomplete algorithms is nowadays attributed to survey propagation inspired decimation [47] and survey propagation inspired reinforcement [48]. In the random 3-SAT problem both of these algorithms were reported to work in linear time (or at most log-linear time) up to a constraint density of about α ≈ 4.25, to be compared with the satisfiability threshold α_s = 4.267 and the clustering transition α_d = 3.86.

As we saw in Sec. IV B, in the LOPs the survey propagation algorithm has no advantage over the belief propagation algorithm. We thus study the performance of BP inspired decimation and reinforcement. The conclusions are as follows: BP decimation fails in the LOPs even at very low connectivities; BP reinforcement works in linear time in the non-clustered phase but fails in the clustered phase.



A. Exhaustive search results 



One way to solve a LOP is to transform it into conjunctive normal form (CNF) and use one of the open-source complete solvers of the satisfiability problem. We have done such a study for the 1-or-3-in-5 SAT problem. We have generated random instances of this problem from a truncated Poisson ensemble, with M constraints. Each instance has been transformed into a satisfiability formula by mapping every constraint into $\sum_{r=0}^{K}\delta_{A_r,0}\binom{K}{r}$ CNF clauses: for every constraint of K variables, one creates as many CNF clauses, out of the 2^K possible clauses, as there are forbidden configurations. We have applied a branch-and-bound based open-source SAT solver called MiniSat 1.14 [49] to test the satisfiability and to compute the running time needed by this algorithm to decide the satisfiability.
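The clause construction described above can be sketched in Python as follows; it writes the instance in the DIMACS CNF format that MiniSat reads. Function names and the toy instance are illustrative assumptions, not the authors' scripts. For A = 010100 (K = 5) the forbidden occupations are r = 0, 2, 4, 5, giving 1 + 10 + 5 + 1 = 17 clauses per constraint, in agreement with the count Σ_r δ_{A_r,0} C(K, r).

```python
from itertools import product


def constraint_to_cnf(variables, A):
    """CNF clauses for one occupation constraint on `variables` (1-based DIMACS indices).

    A is the occupation vector as a string of K + 1 characters; every assignment
    whose number of 1s r has A[r] == '0' is excluded by exactly one clause.
    """
    if len(variables) != len(A) - 1:
        raise ValueError("a constraint must involve K = len(A) - 1 variables")
    clauses = []
    for values in product((0, 1), repeat=len(variables)):
        if A[sum(values)] == "0":
            # this clause is false only on the forbidden assignment `values`
            clauses.append([-v if x == 1 else v for v, x in zip(variables, values)])
    return clauses


def lop_to_dimacs(n, constraints, A):
    """DIMACS CNF text for an LOP instance with n variables."""
    clauses = [cl for vs in constraints for cl in constraint_to_cnf(vs, A)]
    lines = [f"p cnf {n} {len(clauses)}"]
    lines += [" ".join(map(str, cl)) + " 0" for cl in clauses]
    return "\n".join(lines)


# Two 1-or-3-in-5 constraints sharing variables 4 and 5 (hypothetical toy instance).
text = lop_to_dimacs(8, [[1, 2, 3, 4, 5], [4, 5, 6, 7, 8]], "010100")
print(text.splitlines()[0])  # p cnf 8 34
```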
FIG. 4: Left: Probability that a random instance of the 1-or-3-in-5 SAT is satisfiable, versus the average degree. The probability is computed from 500 instances generated from the truncated Poisson ensemble. The vertical line shows the analytical prediction of the value of the satisfiability transition. Right: Median over the same 500 instances of the CPU running time of the complete algorithm MiniSat 1.14 (we have subtracted 0.0012 seconds from the CPU time, as this is approximately where it extrapolates for small average degree and zero system size). Alternatively one could plot the number of backtracking steps, which has a qualitatively identical behavior.



Figure 4, left, shows the probability that an instance is satisfiable, plotted versus the average degree. It displays the typical behavior of a phase transition rounded by finite size effects. Figure 4, right, shows the median value of the CPU time which was used to solve an instance of the decision problem on a 2GHz MacBook laptop (note the logarithmic scale), plotted versus the average degree. The hardest instances appear around the satisfiability threshold l_s, and the time needed by the algorithm in this region clearly grows exponentially with the size. Hard satisfiable instances start to appear around l ≈ l_d, although it is difficult to assert from these data where the exponential behavior really starts. For larger system sizes it seems that the exponential behaviour starts well below the dynamical threshold l_d.

The data show the same qualitative behavior as has been found in similar studies of satisfiability, with the difference that the relative width of the clustered phase, (l_s - l_d)/l_s, is larger in this case than it is in the K = 3 or K = 4 satisfiability problems. The existence of LOPs with such a broad clustered phase is an appealing feature for numerical studies. In the following sections of this paper we argue that in the locked problems the easy-hard algorithmic threshold for the best-known incomplete solvers coincides with the clustering transition l_d.



B. Decimation fails in LOPs 



In BP inspired decimation one uses the knowledge of the marginal probabilities estimated from BP in order to identify the most biased variable, fix it to its most probable value, and reduce the problem. Such an algorithm usually works well even in the clustered region (for performance in K-SAT and coloring see [1, [l^l)). In the locked occupation problems BP decimation fails badly. For example, in the 1-or-3-in-5 SAT problem on truncated Poisson graphs with M = 2 - 10** constraints, the probability of success is about 25% at l = 2, and less than 5% at l = 2.3, well below the clustering threshold l_d = 3.07. Interestingly, the precursors of the failure of BP decimation observed, for instance, in graph coloring are not present in the locked problems. In particular, the BP equations converge during the whole process and the normalizations in the BP equations (5a)-(5b) stay finite.

Although we do not know how to analyze the BP decimation process directly, the mechanisms explaining the failure of the decimation strategy can be understood using the approach of [50]. The idea is to analyze an idealized decimation process, where the variable to be fixed is chosen uniformly at random and its value is chosen according to its exact marginal probability. If its value is chosen according to the BP marginal we speak about uniform BP decimation. If BP gave a fair approximation to the exact marginals throughout the decimation process, uniform BP decimation would be equivalent to the ideal decimation. In the ideal decimation, the reduced problem obtained after θN steps is statistically equivalent to the reduced problem created by choosing a solution uniformly at random and revealing a fraction θ of its variables, which we now analyze following the lines of [50].

Given an instance of the CSP, consider a solution taken uniformly at random and reveal the value of each variable with probability θ. Denote by Φ the fraction of variables which either have been revealed or are directly implied by the revealed ones. We can compute Φ(θ) using the replica symmetric cavity method (which is correct in the satisfiable phase of locked problems) as follows.

Denote by φ_s^{i→b} the probability that variable i is implied, conditioned on the value s of variable i and on the absence of the edge (ib). Denote by q_s^{a→i} the probability that constraint a implies variable i to be s, conditioned on: (1) variable i takes the value s in the solution we chose, (2) variable i was not revealed directly, and (3) the edge (ai) is absent. Then φ_s^{i→b} is given by:



$$ \phi_s^{i\to b} = \theta + (1-\theta)\Big[1 - \prod_{a\in\partial i - b}\big(1 - q_s^{a\to i}\big)\Big] \qquad (32) $$



meaning that variable i was either revealed or not, and if not, it is implied if at least one of the incoming constraints implies it. We write the expression for q_s^{a→i} only for occupation problems on random regular graphs, where the replica symmetric equations are factorized. Then q_s^{a→i} and φ_s^{i→b} are independent of a, b, i: q_s^{a→i} = q_s and φ_s^{i→b} = φ_s. The conditional probability q_s is the ratio of the probability that variable i takes the value s and is implied by the constraint a on one hand, and the probability that variable i takes the value s on the other hand:



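$$ q_1 = \frac{1}{Z^{a\to i}\,\psi_1}\sum_{r=0}^{k}\delta_{A_r,0}\,\delta_{A_{r+1},1}\binom{k}{r}\big(\psi_1^{\,l}\big)^{r}\big(\psi_0^{\,l}\big)^{k-r}\sum_{s=0}^{s_1}\binom{r}{s}\,\phi_0^{\,k-r}\,\phi_1^{\,r-s}\,(1-\phi_1)^{s}, \qquad (33a) $$

$$ q_0 = \frac{1}{Z^{a\to i}\,\psi_0}\sum_{r=0}^{k}\delta_{A_r,1}\,\delta_{A_{r+1},0}\binom{k}{r}\big(\psi_1^{\,l}\big)^{r}\big(\psi_0^{\,l}\big)^{k-r}\sum_{s=0}^{s_0}\binom{k-r}{s}\,\phi_1^{\,r}\,\phi_0^{\,k-r-s}\,(1-\phi_0)^{s}, \qquad (33b) $$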



where l = L - 1 and k = K - 1. The sum over r goes over all the possible numbers of 1's assigned to the incoming variables, and the numbers ψ0, ψ1 are the cavity probabilities, solutions of the BP equations (9a)-(9b). The indices s_1, s_0 in the second sum of both equations are the largest possible such that s_1 ≤ r, s_0 ≤ K - 1 - r, and $\sum_{s=0}^{s_1} A_{r-s} = 0$, $\sum_{s=0}^{s_0} A_{r+1+s} = 0$. The terms $\phi_0^{k-r}\phi_1^{r-s}(1-\phi_1)^s$ and $\phi_1^{r}\phi_0^{k-r-s}(1-\phi_0)^s$ are the probabilities that a sufficient number of incoming variables was revealed such that the outgoing variable is implied (not conditioned on its value). Z^{a→i} is the normalization in (9a)-(9b).
Iterations of eqs. (32)-(33) with the initial condition φ = θ give us the fixed point for q0, q1. The total probability that a variable is fixed is then computed as



$$ \Phi(\theta) = \theta + (1-\theta)\Big\{\mu_1\big[1-(1-q_1)^{L}\big] + \mu_0\big[1-(1-q_0)^{L}\big]\Big\}, \qquad (34) $$



where μ0, μ1 are the total BP marginals, μ_s = ψ_s^L/(ψ_0^L + ψ_1^L). Notice the analogy between eqs. (33a)-(33b) and the equations for the naive reconstruction (11a)-(11b).
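The following Python sketch evaluates the ideal-decimation curve Φ(θ) of (32)-(34) on the L-regular ensemble. It assumes that the factorized BP equations (9a)-(9b), which are not reproduced in this section, take the standard form ψ_s ∝ Σ_r C(k, r) δ_{A_{r+s},1} u_1^r u_0^{k-r}, with u_s ∝ ψ_s^{L-1} the variable-to-constraint message. It computes q_s directly as the ratio described in the text, so no separate normalization Z is needed. Function names, the damping, and the tolerances are our own choices. The sketch is meant to illustrate the structure of the computation, not to reproduce Fig. 5 exactly.

```python
from math import comb


def bp_regular(a, L, iters=10000, damping=0.5, tol=1e-13):
    """Damped iteration of the assumed factorized BP on L-regular graphs; returns (psi_0, psi_1)."""
    k, l = len(a) - 2, L - 1
    psi = [0.5, 0.5]
    for _ in range(iters):
        w0, w1 = psi[0] ** l, psi[1] ** l
        u0, u1 = w0 / (w0 + w1), w1 / (w0 + w1)
        new = [sum(comb(k, r) * u1 ** r * u0 ** (k - r)
                   for r in range(k + 1) if a[r + s] == 1) for s in (0, 1)]
        z = new[0] + new[1]
        new = [damping * psi[s] + (1 - damping) * new[s] / z for s in (0, 1)]
        done = max(abs(new[0] - psi[0]), abs(new[1] - psi[1])) < tol
        psi = new
        if done:
            break
    return psi


def implication_probs(a, u, phi):
    """(q_0, q_1) of (33a)-(33b): P(root s and implied) / P(root s) for one constraint."""
    k = len(a) - 2
    (u0, u1), (p0, p1) = u, phi
    num, den = [0.0, 0.0], [0.0, 0.0]
    for r in range(k + 1):
        w = comb(k, r) * u1 ** r * u0 ** (k - r)
        if a[r + 1] == 1:                  # root = 1 allowed with r ones around it
            den[1] += w
            s1 = 0                         # largest s1 with A_r = ... = A_{r-s1} = 0
            while s1 + 1 <= r and a[r - s1 - 1] == 0:
                s1 += 1
            if a[r] == 0:                  # all zeros revealed, at most s1 ones hidden
                num[1] += w * p0 ** (k - r) * sum(
                    comb(r, s) * p1 ** (r - s) * (1 - p1) ** s for s in range(s1 + 1))
        if a[r] == 1:                      # root = 0 allowed with r ones around it
            den[0] += w
            s0 = 0                         # largest s0 with A_{r+1} = ... = A_{r+1+s0} = 0
            while s0 + 1 <= k - r and a[r + 2 + s0] == 0:
                s0 += 1
            if a[r + 1] == 0:              # all ones revealed, at most s0 zeros hidden
                num[0] += w * p1 ** r * sum(
                    comb(k - r, s) * p0 ** (k - r - s) * (1 - p0) ** s for s in range(s0 + 1))
    return num[0] / den[0], num[1] / den[1]


def ideal_decimation_phi(A, L, theta, iters=10000, tol=1e-12):
    """Phi(theta) of eq. (34) for occupation vector A (string) on the L-regular ensemble."""
    a = [int(x) for x in A]
    l = L - 1
    psi = bp_regular(a, L)
    w0, w1 = psi[0] ** l, psi[1] ** l
    u = (w0 / (w0 + w1), w1 / (w0 + w1))
    m0, m1 = psi[0] ** L, psi[1] ** L
    mu0, mu1 = m0 / (m0 + m1), m1 / (m0 + m1)   # total BP marginals
    phi = (theta, theta)                         # initial condition phi = theta
    for _ in range(iters):
        q0, q1 = implication_probs(a, u, phi)
        new = (theta + (1 - theta) * (1 - (1 - q0) ** l),   # eq. (32) on regular graphs
               theta + (1 - theta) * (1 - (1 - q1) ** l))
        done = max(abs(new[0] - phi[0]), abs(new[1] - phi[1])) < tol
        phi = new
        if done:
            break
    q0, q1 = implication_probs(a, u, phi)
    return theta + (1 - theta) * (mu1 * (1 - (1 - q1) ** L) + mu0 * (1 - (1 - q0) ** L))


for theta in (0.2, 0.4, 0.6, 0.8):
    print(theta, ideal_decimation_phi("010100", 3, theta))
```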
FIG. 5: Analytical and numerical study of the BP inspired uniform decimation. The number of variables which are directly implied is plotted against the number of fixed variables θ in two of the LOPs on the regular graph ensemble with connectivity L.



In Fig. 5 we compare the function Φ(θ) obtained from the analytical study of ideal decimation (34) with the experimental performance of uniform BP decimation. Before the failure of the decimation algorithm (when a contradiction is encountered) the two curves are in very good agreement. This study shows two different reasons for the failure of the algorithm:

• Avalanche of direct implications: In some cases the function Φ(θ) has a discontinuity at a certain spinodal point θ_s (e.g. θ_s ≈ 0.46 at L = 3 for the 1-or-3-in-5 SAT problem). For θ < θ_s, fixing one variable generates a finite number of direct implications. As the loops are of order log N, these implications never lead to a contradiction. At the spinodal point θ_s, fixing one more variable generates an extensive avalanche of direct implications. Small (order 1/N) errors in the previously used BP marginals may thus lead to a contradiction. This indeed happens in almost all the runs we have done. For a more detailed discussion see [50].

• No more free variables: The second reason for the failure is specific to the locked problems, more precisely to the problems where φ0 = φ1 = 1 is a solution of eqs. (32)-(33). In these cases it may happen that Φ(θ) = 1 at some θ_1 < 1 (e.g. θ_1 ≈ 0.73 at L = 4 for the 1-or-3-in-5 SAT problem). In other words, if we reveal a fraction θ > θ_1 of variables from a random solution, the reduced problem will be compatible with only that given solution. Again, if there has been a small error in the previously fixed variables, uniform BP decimation ends up in a contradiction. If, on the contrary, the function Φ(θ) reaches the value 1 only for θ = 1, then the residual entropy is positive and there might be at each step some room to correct previous small errors, as demonstrated on a non-locked problem in Fig. 6.

These two reasons for the failure of uniform BP decimation seem to be of quite different nature, but they have one property in common: as the point of failure is approached, we observe that almost all the variables which are being fixed were already implied. The same sign of failure can also be observed in the maximal BP decimation. In Fig. 6 we compare the two procedures. On the x-axis we plot the number of variables which could have taken both values just before they were fixed. On the y-axis we plot the number of variables which could take only one value before they were fixed, plus the number of implied variables. The failure of both versions of the BP decimation algorithm is then related to the divergence of the derivative of the function y(x).
(Figure 6 panels: left, 3-bicoloring, A = 0110; right, 1-or-3-in-5 SAT, A = 010100. Axis labels: variables fixed; variables not implied before fixed.)



FIG. 6: Left: For comparison, uniform BP decimation works well on the non-locked problems; the example is for bicoloring. Right: Comparison of uniform BP decimation with maximal BP decimation. The number of variables which are directly implied, or were directly implied before being fixed, is plotted against the number of variables which were free just before being fixed. The behavior of the two decimation strategies is similar. The divergence of the derivative of this function marks the point of failure.



C. The BP reinforcement algorithm 

BP reinforcement is currently the most efficient way of using the BP equations in a solver. It was originally introduced in [48], and has also been used in [U, 51]. The main idea is to add an 'external bias' which biases variable i in the direction of the marginal probability computed from the BP messages. This modifies BP eq. (5b) to

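$$ \psi_s^{a\to i} = \frac{1}{Z^{a\to i}}\sum_{\{s_j\}_{j\in\partial a - i}}\delta_{A_{s+\sum_j s_j},1}\prod_{j\in\partial a - i}\chi_{s_j}^{j\to a}, \qquad (35a) $$

$$ \chi_s^{i\to a} = \frac{1}{Z^{i\to a}}\,\mu_s^{i}\prod_{b\in\partial i - a}\psi_s^{b\to i}. \qquad (35b) $$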

We recall that the belief on variable i (the BP estimate of its marginal) without taking into account the bias μ^i, χ_s^i, is given by $\chi_s^i \propto \prod_{b\in\partial i}\psi_s^{b\to i}$.

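We tried several implementations of how the external bias μ_s^i is updated and found the best performance for the following one:

$$ \mu_1^i = \pi,\quad \mu_0^i = 1-\pi, \qquad \text{if } \chi_0^i > \chi_1^i, \qquad (36a) $$
$$ \mu_1^i = 1-\pi,\quad \mu_0^i = \pi, \qquad \text{if } \chi_0^i < \chi_1^i, \qquad (36b) $$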

where 0 < π < 1/2 is a parameter which needs to be optimized. In the iterative update of BP reinforcement, the external bias is not updated at every BP iteration, but only with probability

$$ p(t) = 1 - (1+t)^{-\gamma}, \qquad (37) $$

where t is the time step and γ a parameter to be optimized. The pseudocode of the algorithm is then as follows.

BP-REINFORCEMENT(T, γ, π)
1  Initialize μ_s^i and ψ_s^{a→i} randomly;
2  t ← 0;
3  Compute the current configuration r_i = argmax_s μ_s^i;
4  repeat Make one sweep of the BP iterations (35a)-(35b);
5      Update every bias μ_s^i with probability p(t) according to (36a)-(36b);
6      Update r_i = argmax_s μ_s^i;
7      t ← t + 1;
8  until {r_i} is a solution or t > T;
This algorithm depends on two empirical parameters, γ and π. We generally use γ = 0.1. The optimization of the bias strength π is crucial. Empirically we observed three different regimes:



(a) π_BP-like < π < 0.5: When the bias is weak, BP reinforcement converges very fast to a BP-like fixed point, and the values of the local fields do not point towards any solution. On the contrary, many constraints are violated by the final configuration {r_i}.

(b) π_conv < π < π_BP-like: BP reinforcement converges to a solution {r_i}.

(c) 0 < π < π_conv: When the bias is too strong, BP reinforcement does not converge, and many constraints are violated by the configuration {r_i} which is reached after T_max steps.

When the constraint density in the CSP is large, regime (b) disappears and π_conv = π_BP-like. Clearly, the goal is to find π_conv < π < π_BP-like. The point π_BP-like is very easy to find, because for larger π the convergence of BP reinforcement to a BP-like fixed point takes place in just a few sweeps. Thus in all the runs we chose π to be just below π_BP-like. The value of π chosen in this way does not seem to depend on the size of the system, but it depends slightly on the constraint density.
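The pseudocode can be turned into a minimal Python sketch for an occupation CSP whose vector A is given as a string and whose constraints are tuples of distinct variable indices. Several details are our own assumptions and are not fixed by the text: the random initialization, the tie rule in (36) (the bias is left unchanged when χ_0^i = χ_1^i), the asynchronous message schedule, and all function names. The default π = 0.3 is only a placeholder, since the text stresses that π must be tuned just below π_BP-like. The instance generator and the restarts used for the figures are not included.

```python
import random


def count_distribution(msgs):
    """Distribution of the number of 1s among independent binary variables."""
    dist = [1.0]
    for p0, p1 in msgs:
        new = [0.0] * (len(dist) + 1)
        for r, w in enumerate(dist):
            new[r] += w * p0
            new[r + 1] += w * p1
        dist = new
    return dist


def normalize(x0, x1):
    z = x0 + x1
    return (x0 / z, x1 / z) if z > 0 else (0.5, 0.5)


def is_solution(config, constraints, A):
    return all(A[sum(config[i] for i in c)] == "1" for c in constraints)


def bp_reinforcement(n, constraints, A, T=1000, gamma=0.1, pi=0.3, seed=0):
    """Return a solution (list of 0/1) or None if none is reached within T sweeps."""
    rng = random.Random(seed)
    nbrs = [[] for _ in range(n)]                  # constraints containing each variable
    for a, c in enumerate(constraints):
        for i in c:
            nbrs[i].append(a)
    # random messages psi^{a->i} and random external biases mu^i
    psi = {(a, i): normalize(rng.random(), rng.random())
           for a, c in enumerate(constraints) for i in c}
    mu = [normalize(rng.random(), rng.random()) for _ in range(n)]
    for t in range(T + 1):                         # t = 0, 1, ..., T
        # one sweep of (35a)-(35b)
        for a, c in enumerate(constraints):
            chi = {}
            for j in c:                            # (35b): chi^{j->a} ~ mu^j * prod_b psi^{b->j}
                x0, x1 = mu[j]
                for b in nbrs[j]:
                    if b != a:
                        x0 *= psi[(b, j)][0]
                        x1 *= psi[(b, j)][1]
                chi[j] = normalize(x0, x1)
            for i in c:                            # (35a): psi_s ~ sum_r P(r ones) delta(A_{r+s}, 1)
                dist = count_distribution([chi[j] for j in c if j != i])
                psi[(a, i)] = normalize(
                    sum(w for r, w in enumerate(dist) if A[r] == "1"),
                    sum(w for r, w in enumerate(dist) if A[r + 1] == "1"))
        # each bias is reset by (36a)-(36b) with probability p(t) of (37)
        p = 1.0 - (1.0 + t) ** (-gamma)
        for i in range(n):
            if rng.random() < p:
                b0 = b1 = 1.0                      # belief chi^i without the bias
                for b in nbrs[i]:
                    b0 *= psi[(b, i)][0]
                    b1 *= psi[(b, i)][1]
                if b0 > b1:
                    mu[i] = (1.0 - pi, pi)
                elif b0 < b1:
                    mu[i] = (pi, 1.0 - pi)
        config = [int(m[1] > m[0]) for m in mu]    # r_i = argmax_s mu_s^i
        if is_solution(config, constraints, A):
            return config
    return None
```

For example, `bp_reinforcement(6, [(0, 1, 2), (2, 3, 4), (4, 5, 0)], "0100", T=200)` runs the sketch on a tiny hypothetical 1-in-3 SAT instance and returns either a satisfying assignment or None.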
FIG. 7: Performance of BP reinforcement on two of the locked occupation problems: probability of success versus average connectivity. Left: A = 010100, with optimal parameters γ = 0.1, π = 0.28 for 2.79 ≤ l ≤ 2.95, π = 0.30 for 2.97 ≤ l ≤ 3.13, π = 0.31 for l = 3.15. Right: A = 010010 with γ = 0.1, π = 0.34. The different curves are for two different system sizes and two different maximal running times. The algorithm performs well only up to a connectivity close to the clustering transition (l_d = 3.07 and l_d ≈ 2.41 respectively, to be compared with the satisfiability thresholds l_s = 4.72 and l_s = 3.16). Qualitatively similar results were observed for all the other locked occupation problems we studied.



We tested the BP reinforcement algorithm on the locked occupation CSPs; the results are shown in Fig. 7. The fraction of successful runs on different system sizes and for different maximal running times is plotted as a function of the mean variable connectivity. Our data suggest that the algorithm is successful only in the liquid phase, and fails in the clustered (that is, also frozen) region. Similar results can be obtained with other algorithms; for instance, the performance of stochastic local search was reported in [13].

The clustered phase is thus extremely hard, and instances of the locked problems can serve as benchmarks for new solvers. In fact, some of the hardest benchmarks of the K-satisfiability problem are based on a well-known LOP, XOR-SAT (with some additional non-linear function nodes which rule out Gaussian elimination solvers) [52, 53].

In the non-locked problems the very same implementation of BP reinforcement is able to find solutions inside the clustered region. Fig. 8, left, shows the performance for A = 010110. This is in qualitative agreement with results for K-SAT [48], coloring [1, 11] or bicoloring problems [21].

It is not known how one can characterize, from a geometrical point of view, the connectivity threshold where BP reinforcement algorithms stop being efficient in the non-locked problems. It has been found in [21] that even the rigid phase, where almost all solutions are frozen, may be algorithmically easy. Fig. 8 confirms this statement for the problem A = 0110100 on the regular ensemble with L = 8. The success rate of BP reinforcement (with 3 restarts) is close to one and basically independent of system size, while it can be seen from (|TT]) that this problem is in the rigid phase. On the other hand, the fraction of found solutions which are frozen (have a nontrivial whitening core [34, 54]) goes to zero as the system size grows, in agreement with the results of pT]. Thus the question of where the easy/hard threshold lies in the non-locked problems remains open.
FIG. 8: Left: Performance of the BP reinforcement on one of the non-locked problems, A = 010110. Parameters: γ = 0.1;
π = 0.40 for 7.0 < l < 7.8, π = 0.42 for 7.9 < l < 8.0, π = 0.44 for 8.1 < l < 8.4. The implementation of the algorithm is the
same as for the locked problems in Fig. 7. Here solutions are found up to about halfway into the clustered region, which starts at l_d = 7.40. The
condensation transition l_c = 8.78 and the satisfiability transition l_s = 8.86 are also marked. Right: A = 0110100 on regular graphs with
L = 8 is in the rigid phase, that is, almost all solutions belong to frozen clusters. Yet the BP reinforcement (γ = 0.1, π = 0.36)
finds a solution almost surely (after 3 restarts; red curve). The blue curve gives the fraction of the found solutions that
belong to a frozen cluster. Asymptotically, frozen solutions are never found.



VI. CONCLUSION 



We studied the class of occupation CSPs, on which we illustrated the difference between locked and non-locked CSPs.
The point-like nature of clusters in LOPs is responsible for all of these differences. Our findings may be summarized as:
"Locked problems are extremely simple and extremely hard." The simplicity comes at the level of the phase diagram,
which can be computed by the cavity method much more easily than in the general CSP. In certain cases some nontrivial
quantities are probably amenable to a rigorous study along the lines that we sketched, for example the
satisfiability threshold in the balanced locked problems. The hardness is algorithmic: some algorithms, such as BP
decimation, fail completely, and even the best known algorithms are not able to find solutions in the clustered phase
of the locked problems. Their simple description and algorithmic hardness make the locked problems challenging both for
the development of new algorithms and for a better theoretical understanding of the origin of hardness.

There are several clear directions in which this work should be extended. The planted ensembles of LOPs should
be studied in order to provide hard benchmarks where the existence of a solution is guaranteed. On the
mathematical side, the rigorous second-moment proof of the satisfiability threshold in the balanced
locked problems should be worked out. One may investigate whether the location of the clustering threshold can be proven
rigorously using small-noise reconstruction along the lines of [55], or by bounding the weight enumerator function along the
lines of [39]. It will also be interesting to study (at least numerically) the dynamics at finite temperature, as it might
provide further insight into the connection between the dynamics of algorithms and the structure of solutions. Finally,
the distance properties between solutions in locked CSPs make them interesting candidates for the development of
nonlinear error-correcting codes or compression schemes.